Abstract—In recent years, graph representation learning has
undergone a paradigm shift, driven by the emergence and 
proliferation of graph neural networks (GNNs) and their hetero- 
geneous counterparts. Heterogeneous GNNs have shown remark- 
able success in extracting low-dimensional embeddings from com- 
plex graphs that encompass diverse entity types and relationships. 
While meta-path-based techniques have long been recognized for 
their ability to capture semantic affinities among nodes, their 
dependence on manual specification poses a significant limitation. 
In contrast, matrix-focused methods accelerate processing by 
utilizing structural cues but often overlook contextual richness. 
In this paper, we challenge the current paradigm by introducing 
ontology as a fundamental semantic primitive within complex 
graphs. Our goal is to integrate the strengths of both matrix-centric
and meta-path-based approaches into a unified framework.
We propose Perturbation Ontology-based Graph Attention
Networks (POGAT), a novel methodology that combines ontology 
subgraphs with an advanced self-supervised learning paradigm 
to achieve a deep contextual understanding. The core innovation 
of POGAT lies in our enhanced homogeneous perturbing scheme 
designed to generate rigorous negative samples, encouraging 
the model to explore minimal contextual features more thoroughly.
Through extensive empirical evaluations, we demonstrate
that POGAT significantly outperforms state-of-the-art baselines,
achieving improvements of up to 10.78% in F1-score for link
prediction and 12.01% in Micro-F1 for node classification.

Index Terms—Heterogeneous Graph, Graph Neural Networks, 
Perturbation Ontology Subgraphs.


I. INTRODUCTION 


Graphs are a powerful way to represent complex rela- 
tionships among objects, but their high-dimensional nature 
requires transformation into lower-dimensional representations 
through graph representation learning for effective appli- 
cations. The emergence of graph neural networks (GNNs) 
has significantly enhanced this process. While early network 
embedding methods focused on homogeneous graphs, the 
rise of heterogeneous information networks (HINs) in real-world
contexts, such as citation, biomedical, and social networks,
demands the capture of intricate semantic information
due to diverse interconnections among heterogeneous entities.
Addressing HIN heterogeneity to maximize semantic capture 
remains a key challenge. 

In HINs, graph representation learning can be classified into 
two main categories: meta-path-based methods and adjacency 
matrix-based methods. Meta-path-based approaches leverage 
meta-paths to identify semantic similarities between target 
nodes, thereby establishing meta-path-based neighborhoods. A 
meta-path is a defined sequence in HINs that links two entities 
through a composite relationship, reflecting a specific type of 
semantic similarity. For instance, in a social HIN comprising 
four node types (User, Post, Tag, Location) and three edge 
types (“interact,” “mark,” “locate”), two notable meta-paths are
UPU (User-Post-User) and UPTPU (User-Post-Tag-Post-User). On the other hand, adjacency
matrix-based methods emphasize the structural relationships 
among nodes, utilizing adjacency matrices to propagate node 
features and aggregate information from neighboring struc- 
tures. 
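To make the notion of a meta-path-based neighborhood concrete, the following minimal sketch enumerates the UPU and UPTPU neighbors of one user in a toy social HIN. The edge lists, node names, and the helpers `build_index` and `metapath_neighbors` are hypothetical and serve only as an illustration; they are not taken from any of the cited methods.

```python
from collections import defaultdict

# Toy typed edges (hypothetical data): User -interact-> Post, Post -mark-> Tag.
USER_POST = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u4", "p3")]
POST_TAG = [("p1", "t1"), ("p2", "t1"), ("p3", "t2")]


def build_index(edges):
    """Return forward and backward adjacency maps for one edge type."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        fwd[src].add(dst)
        bwd[dst].add(src)
    return fwd, bwd


def metapath_neighbors(start, hops):
    """Follow a sequence of adjacency maps and return the nodes reached."""
    frontier = {start}
    for adj in hops:
        frontier = {nxt for node in frontier for nxt in adj.get(node, ())}
    return frontier - {start}  # a node is not its own neighbor


up, pu = build_index(USER_POST)
pt, tp = build_index(POST_TAG)

upu = metapath_neighbors("u1", [up, pu])            # User-Post-User
uptpu = metapath_neighbors("u1", [up, pt, tp, pu])  # User-Post-Tag-Post-User
print(sorted(upu), sorted(uptpu))
```

In this toy graph the longer meta-path reaches a user that the shorter one misses, which illustrates why the choice of meta-paths changes the neighborhood a model sees.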

Both meta-path-based and adjacency matrix-based methods 
have notable limitations. Meta-path-based techniques often 
struggle with selecting effective meta-paths, as the relation- 
ships they represent can be complex and implicit. This makes 
it challenging to identify which paths enhance representation 
learning, especially in HINs with diverse node and relation
types. The search space for meta-paths grows exponentially
with path length, necessitating expert knowledge
to identify the most relevant paths. A limited selection can 
lead to significant information loss, adversely affecting model 
performance. On the other hand, adjacency matrix-based meth- 
ods focus on structural information from neighborhoods but 
often overlook the rich semantics of HINs. While they can 
be viewed as combinations of 1-hop meta-paths, they lack 
the robust semantic framework needed to effectively capture
implicit semantic information, leading to further information
loss.

TABLE I
SUMMARY OF DATASETS (N TYPES: NODE TYPES, E TYPES: EDGE TYPES,
TARGET: TARGET NODE, AND CLASSES: TARGET CLASSES).

Dataset   # Nodes  # N Types  # Edges  # E Types  Target  # Classes  Task
DBLP      26,128   4          119,783  3          author  4          LP&NC
IMDB-L    21,420   4          86,642   6          movie   4          NC
IMDB-S    11,616   3          17,106   2          -       -          LP
Freebase  43,854   4          151,034  6          movie   3          NC
AMiner    55,783   3          153,676  4          paper   4          LP&NC
Alibaba   22,649   3          45,734   5          -       -          LP

To address these challenges, we propose using HIN rep- 
resentation learning based on Ontology [1], which compre- 
hensively describes entity types and relationships. Ontology 
models a world of object types, attributes, and relationships 
[2], emphasizing its semantic properties. Since HINs are 
semantic networks constructed based on Ontology, we assert 
that Ontology provides all necessary semantic information. We 
define a minimal HIN subgraph that aligns with all possible 
ontology descriptions as an ontology subgraph. An HIN can 
be seen as a concatenation of these ontology subgraphs, 
which offer a complete context for nodes, representing the 
minimal complete context of each node. Nodes within an 
ontology subgraph are considered ontology neighbors, forming 
a local complete context. Compared to meta-paths, ontology 
subgraphs encompass richer semantics, capturing all node and 
relation types along with complete context, while meta-paths 
are limited in scope. Because meta-paths are themselves derived from
the Ontology, ontology subgraphs can also capture the semantic
similarities that meta-paths express, at least to some extent.
Importantly, the structure of an ontology
subgraph is predefined, requiring only a search rather than 
manual design. In contrast to adjacency matrices, ontology 
subgraphs represent the smallest complete semantic units with 
rich semantic information and also provide structural insights 
due to their natural graph structure. In summary, Ontology 
combines the strengths of both meta-paths and adjacency 
matrices. 
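As an illustration of why ontology subgraphs only need to be searched for rather than designed, the sketch below enumerates them by brute force on a toy HIN. It rests on one possible reading of the definition, namely that an ontology subgraph contains exactly one node of each type and realizes every relation in the ontology; this reading, the toy data, and the function `ontology_subgraphs` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical ontology (schema): relation name -> (source type, target type).
ONTOLOGY = {
    "interact": ("User", "Post"),
    "mark": ("Post", "Tag"),
    "locate": ("Post", "Location"),
}

# Hypothetical HIN instance: nodes grouped by type, edges grouped by relation.
NODES = {
    "User": ["u1", "u2"],
    "Post": ["p1", "p2"],
    "Tag": ["t1"],
    "Location": ["l1"],
}
EDGES = {
    "interact": {("u1", "p1"), ("u2", "p1"), ("u2", "p2")},
    "mark": {("p1", "t1"), ("p2", "t1")},
    "locate": {("p1", "l1")},
}


def ontology_subgraphs(ontology, nodes, edges):
    """Enumerate assignments of one node per type that realize every relation.

    Assumption: an ontology subgraph holds exactly one node of each type.
    Brute force over the Cartesian product, so suitable for toy graphs only.
    """
    types = sorted(nodes)
    found = []
    for combo in product(*(nodes[t] for t in types)):
        assign = dict(zip(types, combo))
        if all((assign[src], assign[dst]) in edges[rel]
               for rel, (src, dst) in ontology.items()):
            found.append(assign)
    return found


subgraphs = ontology_subgraphs(ONTOLOGY, NODES, EDGES)
for sg in subgraphs:
    print(sg)
```

Post `p2` has no location edge, so no complete subgraph contains it. Both resulting subgraphs share `p1`, which makes `u1` and `u2` ontology neighbors in this toy example.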

In this paper, we present Perturbation Ontology-based 
Graph Attention Networks (POGAT) for graph representation 
learning that leverages ontology. To improve node context 
representation, we aggregate both intra-ontology and inter- 
ontology subgraphs. Our self-supervised training incorporates 
a perturbation strategy, enhanced by homogeneous node re- 
placement to generate hard negative samples, which helps 
the model capture more nuanced node features. Experimental 
results demonstrate that our method surpasses several existing 
approaches, achieving state-of-the-art performance in both link 
prediction and node classification tasks. 


II. METHODS 


With ontology subgraphs as the fundamental semantic building
blocks, this section develops a contextual representation of
nodes using these subgraphs. We then design training tasks
for the network by perturbing the ontology subgraphs.

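First, we prepare the input node and edge embeddings
within an ontology subgraph to be passed to the Graph
Transformer Layer (similar to [4]). For an ontology subgraph
$G$ with node features $\alpha_i \in \mathbb{R}^{d_n \times 1}$ for each node $i$ and edge
features $\beta_{ij} \in \mathbb{R}^{d_e \times 1}$ for each edge between node $i$ and node
$j$, the input node features $\alpha_i$ and edge features $\beta_{ij}$ are passed
through a linear projection to embed them into $d$-dimensional hidden
features $\hat{h}_i^{0}$ and $e_{ij}^{0}$:

$$\hat{h}_i^{0} = A^{0}\alpha_i + a^{0}, \qquad e_{ij}^{0} = B^{0}\beta_{ij} + b^{0}, \tag{1}$$

where $A^{0} \in \mathbb{R}^{d \times d_n}$, $B^{0} \in \mathbb{R}^{d \times d_e}$, and $a^{0}, b^{0} \in \mathbb{R}^{d}$ are the
parameters of the linear projection layers. We then embed
the pre-computed node positional encodings $\lambda_i$ of dimension $k$ using a
linear projection and add them to the node features $\hat{h}_i^{0}$:

$$\lambda_i^{0} = C^{0}\lambda_i + c^{0}, \qquad h_i^{0} = \hat{h}_i^{0} + \lambda_i^{0}, \tag{2}$$

where $C^{0} \in \mathbb{R}^{d \times k}$ and $c^{0} \in \mathbb{R}^{d}$.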


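Next, we define the node update equations for layer $\ell$:

$$\hat{h}_i^{\ell+1} = O_h^{\ell} \, \Big\Vert_{k=1}^{H} \Big( \sum_{j \in \mathcal{N}_i} w_{ij}^{k,\ell} \, V^{k,\ell} h_j^{\ell} \Big), \tag{3}$$

$$\text{where} \quad w_{ij}^{k,\ell} = \mathrm{softmax}_j \left( \frac{Q^{k,\ell} h_i^{\ell} \cdot K^{k,\ell} h_j^{\ell}}{\sqrt{d_k}} \right), \tag{4}$$

and $Q^{k,\ell}, K^{k,\ell}, V^{k,\ell} \in \mathbb{R}^{d_k \times d}$, $O_h^{\ell} \in \mathbb{R}^{d \times d}$, $k = 1, \dots, H$
indexes the $H$ attention heads, and $\Vert$ denotes
concatenation.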

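To ensure numerical stability, the outputs after exponentiating
the terms inside the softmax are clamped to lie between $-5$
and $+5$. The attention outputs $\hat{h}_i^{\ell+1}$ are then passed to a Feed
Forward Network, which is preceded and followed by residual
connections and normalization layers, as follows:

$$\hat{\hat{h}}_i^{\ell+1} = \mathrm{LayerNorm}\big(h_i^{\ell} + \hat{h}_i^{\ell+1}\big), \tag{5}$$

$$\hat{\hat{\hat{h}}}_i^{\ell+1} = W_2^{\ell} \, \mathrm{ReLU}\big(W_1^{\ell} \, \hat{\hat{h}}_i^{\ell+1}\big), \tag{6}$$

$$h_i^{\ell+1} = \mathrm{LayerNorm}\big(\hat{\hat{h}}_i^{\ell+1} + \hat{\hat{\hat{h}}}_i^{\ell+1}\big), \tag{7}$$

where $W_1^{\ell} \in \mathbb{R}^{2d \times d}$, $W_2^{\ell} \in \mathbb{R}^{d \times 2d}$, and $\hat{\hat{h}}_i^{\ell+1}$, $\hat{\hat{\hat{h}}}_i^{\ell+1}$ denote intermediate
representations. The bias terms are omitted for clarity.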
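The neighbor-wise attention of Eqs. (3) and (4) can be sketched for a single head in plain Python. This is a minimal illustration under stated assumptions: the scaled scores are clamped to [-5, 5] before exponentiation (one reading of the stability remark above), and the output projection, head concatenation, residual connections, LayerNorm, and feed-forward network are all omitted. The function names and the toy inputs are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attention_head(h, neighbors, Q, K, V, clamp=5.0):
    """One attention head with a softmax over each node's neighbors.

    h: dict node -> feature vector; neighbors: dict node -> list of nodes.
    Assumption: scaled scores are clamped to [-clamp, clamp] before exp.
    Every node listed in `neighbors` must have at least one neighbor.
    """
    d_k = len(Q)  # rows of Q give the head dimension
    q = {i: matvec(Q, x) for i, x in h.items()}
    k = {i: matvec(K, x) for i, x in h.items()}
    v = {i: matvec(V, x) for i, x in h.items()}
    out = {}
    for i, nbrs in neighbors.items():
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d_k)
                  for j in nbrs]
        exps = [math.exp(max(-clamp, min(clamp, s))) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the neighbors' value vectors.
        out[i] = [sum(w * v[j][c] for w, j in zip(weights, nbrs))
                  for c in range(len(V))]
    return out


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
out = attention_head(features, adjacency, IDENTITY, IDENTITY, IDENTITY)
print(out)
```

With identity projections, node `b` simply copies its only neighbor `a`, while node `a` mixes `b` and `c` with weights that sum to one.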

Given that each ontology subgraph $O_i$ associated with
the target node $v$ independently yields an intra-aggregation
representation, the semantic information from these subgraphs
must be integrated within the broader network M through an
inter-aggregation process. Since the minimal context semantics
of different subgraphs should be treated as equivalent, we use
a multi-head attention mechanism to aggregate the semantic
information across ontology subgraphs:

$$h^{(l)} = \mathrm{Concat}\big(\sigma\big(h^{(l-1)}\big)\big), \tag{8}$$


TABLE II
PERFORMANCE EVALUATION ON NODE CLASSIFICATION.
Results are in percent; the best result in each column is in the POGAT row.

Methods         | DBLP                     | IMDB-S                   | Freebase                 | AMiner
                | Micro-F1     Macro-F1    | Micro-F1     Macro-F1    | Micro-F1     Macro-F1    | Micro-F1     Macro-F1
GAT [3]         | 93.39 ±0.30  93.83 ±0.27 | 64.86 ±0.43  58.94 ±1.35 | 69.04 ±0.58  59.28 ±2.56 | 84.92 ±0.68  74.32 ±0.95
Transformer [4] | 93.99 ±0.11  93.48 ±0.12 | 66.29 ±0.69  62.79 ±0.65 | 67.89 ±0.39  63.35 ±0.46 | 85.72 ±0.43  74.15 ±0.28
RGCN [5]        | 92.07 ±0.50  91.52 ±0.50 | 62.95 ±0.15  58.85 ±0.26 | 60.82 ±1.23  59.08 ±1.44 | 81.58 ±1.44  62.53 ±2.31
HetGNN [6]      | 92.33 ±0.41  91.76 ±0.43 | 51.16 ±0.65  48.25 ±0.67 | 62.99 ±2.31  58.44 ±1.99 | 72.34 ±1.42  55.42 ±1.45
HAN [7]         | 92.05 ±0.62  91.67 ±0.49 | 64.63 ±0.58  57.74 ±0.96 | 61.42 ±3.56  57.05 ±2.06 | 81.90 ±1.51  64.67 ±2.21
GTN [8]         | 93.97 ±0.54  93.52 ±0.55 | 65.14 ±0.45  60.47 ±0.98 | -            -           | -            -
MAGNN [9]       | 93.76 ±0.45  93.28 ±0.51 | 64.67 ±1.67  56.49 ±3.20 | 64.43 ±0.73  58.18 ±3.87 | 82.64 ±1.59  68.60 ±2.04
RSHN [10]       | 93.81 ±0.55  93.34 ±0.58 | 64.22 ±1.03  59.85 ±3.21 | 61.43 ±5.37  57.37 ±1.49 | 73.33 ±2.71  51.48 ±4.20
HetSANN [11]    | 80.56 ±1.50  78.55 ±2.42 | 57.68 ±0.44  49.47 ±1.21 | -            -           | -            -
HGT [12]        | 93.49 ±0.25  93.01 ±0.23 | 67.20 ±0.57  63.00 ±1.19 | 66.43 ±1.88  60.03 ±2.21 | 85.74 ±1.24  74.98 ±1.61
SimpleHGN [13]  | 94.46 ±0.22  94.01 ±0.24 | 67.36 ±0.57  63.53 ±1.36 | 67.49 ±0.97  62.49 ±1.69 | 86.44 ±0.48  75.73 ±0.97
HINormer [14]   | 94.94 ±0.21  94.57 ±0.23 | 67.83 ±0.34  64.65 ±0.53 | 69.42 ±0.63  63.93 ±0.59 | 88.04 ±0.12  79.88 ±0.24
POGAT           | 96.71 ±0.25  96.21 ±0.22 | 74.33 ±0.35  72.42 ±0.37 | 74.12 ±0.49  72.74 ±0.47 | 93.37 ±0.13  88.24 ±0.28

TABLE III
MODEL PERFORMANCE COMPARISON FOR THE TASK OF LINK PREDICTION ON DIFFERENT DATASETS.

Method        | AMiner                | Alibaba               | IMDB-L                | DBLP
              | R-AUC   PR-AUC  F1    | R-AUC   PR-AUC  F1    | R-AUC   PR-AUC  F1    | R-AUC   PR-AUC  F1
node2vec [15] | 0.594   0.663   0.602 | 0.614   0.580   0.593 | 0.479   0.568   0.474 | 0.449   0.452   0.478
RandNE [16]   | 0.607   0.630   0.608 | 0.877   0.888   0.826 | 0.901   0.933   0.839 | 0.492   0.491   0.493
FastRP [17]   | 0.620   0.634   0.600 | 0.927   0.900   0.926 | 0.869   0.893   0.811 | 0.515   0.528   0.506
SGC [18]      | 0.589   0.585   0.567 | 0.686   0.708   0.623 | 0.826   0.889   0.769 | 0.601   0.606   0.587
R-GCN [5]     | 0.599   0.601   0.610 | 0.674   0.710   0.629 | 0.826   0.878   0.790 | 0.589   0.592   0.566
MAGNN [9]     | 0.663   0.681   0.666 | 0.961   0.963   0.948 | 0.912   0.923   0.887 | 0.690   0.699   0.684
HPN [19]      | 0.658   0.664   0.660 | 0.958   0.961   0.950 | 0.900   0.903   0.892 | 0.692   0.710   0.687
PMNE-r [20]   | 0.613   0.635   0.657 | 0.597   0.591   0.664 | 0.651   0.634   0.630 | 0.622   0.625   0.609
MNE [21]      | 0.660   0.672   0.681 | 0.944   0.946   0.901 | 0.688   0.701   0.681 | 0.657   0.660   0.635
GATNE [22]    | OOT     OOT     OOT   | 0.981   0.986   0.952 | 0.872   0.878   0.791 | OOT     OOT     OOT
DMGI [23]     | OOM     OOM     OOM   | 0.857   0.781   0.784 | 0.926   0.935   0.873 | 0.610   0.615   0.601
FAME [24]     | 0.687   0.747   0.726 | 0.993   0.996   0.979 | 0.944   0.959   0.897 | 0.642   0.650   0.633
DualHGNN [25] | /       /       /     | 0.974   0.977   0.966 | /       /       /     | /       /       /
MHGCN [26]    | 0.711   0.753   0.730 | 0.997   0.997   0.992 | 0.967   0.966   0.959 | 0.718   0.722   0.703
BPHGNN [27]   | 0.723   0.762   0.723 | 0.995   0.996   0.994 | 0.969   0.965   0.943 | 0.726   0.734   0.731
POGAT         | 0.804   0.812   0.801 | 0.998   0.997   0.994 | 0.967   0.986   0.975 | 0.838   0.819   0.803
Std.          | 0.012   0.014   0.011 | 0.011   0.010   0.011 | 0.012   0.013   0.012 | 0.013   0.021   0.012

OOT: Out Of Time (36 hours). OOM: Out Of Memory; DMGI runs out of memory on the entire AMiner data. R-AUC: ROC-AUC.


where $K$ is the number of attention heads (indexed by $k$), Concat(·) denotes
the concatenation of vectors, and we obtain the representation
of the last layer by averaging over the heads:

$$h^{(L)} = \frac{1}{K} \sum_{k=1}^{K} h^{(k)}. \tag{9}$$


A. Bi-level Perturbation Ontology Training


To enhance the model’s ability to capture the intrinsic 
semantics of ontology, we employ a perturbation technique 
to modify the ontology. We also design two specific tasks to 
differentiate perturbation subgraphs at both the node level and 
the graph level. 

1) Ontology Subgraph Perturbation:

In this section, we enhance the perturbation operation on ontology
subgraphs to generate negative samples for self-supervised
tasks. Initially, we tried the common all-zero mask, which
replaces node embeddings with zero vectors, but this approach 
yielded unsatisfactory results. Drawing inspiration from [28], 
which used random graphs as noise distributions, we then 
implemented a random mask that selects nodes randomly for 
substitution, resulting in some improvement. However, given 
the significant differences in information among various node 
types, using random nodes can create negative samples that 
are too dissimilar to the positive samples, making the task 
easier and potentially reducing model performance. To address 
this, we further refined our strategy by substituting nodes 
with similar types, thereby constructing challenging negative 
samples that enhance the model’s ability to learn from minimal 
contexts. 


We take the ontology subgraph set (i.e., $O_{sub}$) as positive
samples. Then, we randomly replace nodes in the subgraphs
with nodes of the same type to preserve a certain level of
semantic similarity. These substitute nodes are marked with
diagonal lines. If the generated perturbation subgraph is not
included in the original ontology subgraph set, it is labeled as
a negative sample and denoted as $O_i^{-}$. The set of all negative
ontology subgraphs is denoted as $O_{sub}^{-}$. Next, we shuffle
all positive and negative samples and apply a readout over
the node context representations to obtain a graph-level
representation of each subgraph $O_i$:

$$h_{O_i} = \mathrm{ReadOut}\big(h_v \mid \forall v \in O_i, \ O_i \in O_{sub} \cup O_{sub}^{-}\big). \tag{10}$$
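The same-type replacement procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: encoding a subgraph as a dictionary with one node per type, perturbing exactly one node per subgraph, and the names `perturb_same_type` and `build_samples` are all assumptions made for the example.

```python
import random


def perturb_same_type(subgraph, nodes_by_type, positives, rng):
    """Swap one node for a different node of the same type.

    subgraph: dict type -> node id (hypothetical encoding, one node per type).
    Returns the perturbed subgraph, or None when no substitute exists or the
    result coincides with an existing positive sample.
    """
    node_type = rng.choice(sorted(subgraph))
    candidates = [n for n in nodes_by_type[node_type] if n != subgraph[node_type]]
    if not candidates:
        return None
    perturbed = dict(subgraph)
    perturbed[node_type] = rng.choice(candidates)
    key = tuple(sorted(perturbed.items()))
    return None if key in positives else perturbed


def build_samples(positive_subgraphs, nodes_by_type, seed=0):
    """Return shuffled (subgraph, label) pairs: 1 = original, 0 = perturbed."""
    rng = random.Random(seed)
    positives = {tuple(sorted(sg.items())) for sg in positive_subgraphs}
    samples = [(sg, 1) for sg in positive_subgraphs]
    for sg in positive_subgraphs:
        negative = perturb_same_type(sg, nodes_by_type, positives, rng)
        if negative is not None:
            samples.append((negative, 0))
    rng.shuffle(samples)
    return samples


nodes_by_type = {"User": ["u1", "u2", "u3"], "Post": ["p1", "p2"]}
positive_subgraphs = [{"User": "u1", "Post": "p1"}, {"User": "u2", "Post": "p1"}]
samples = build_samples(positive_subgraphs, nodes_by_type)
for subgraph, label in samples:
    print(label, subgraph)
```

Because a substitute always has the same type as the node it replaces, a negative sample differs from its source subgraph in a single same-type node, which is what makes it hard to tell apart from a positive one.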


2) Graph-level Discrimination:
For graph-level training, we design a graph discriminator
based on an MLP to determine whether a subgraph has
been perturbed:

$$y_{pred,G} = \mathrm{Discriminator}_G\big(h_{O_i}\big). \tag{11}$$

Then we calculate the cross-entropy loss:

$$\mathcal{L}_G = \sum_{O_i} \mathrm{CROSSENT}\big(y_{pred,G}, \, y_{true,G}\big), \tag{12}$$

where $y_{true,G}$ stands for the labels of the graph-level task.
3) Node-level Discrimination:
Given the node representation $h_v$ for node $v$, we further
employ an MLP $\phi_{MLP}(\cdot \, ; \theta_{pat})$ parameterized by $\theta_{pat}$ to predict
the class distribution as follows,

$$\tilde{y}_v = \phi_{MLP}(h_v; \theta_{pat}), \tag{13}$$

where $\tilde{y}_v \in \mathbb{R}^{C}$ is the prediction and $C$ is the number of
classes. In addition, we apply an $L_2$ normalization to
$\tilde{y}_v$ for stable optimization.

Given the training nodes $V_{tr}$, for multi-class node classification,
we employ cross-entropy as the overall loss:

$$\mathcal{L}_N = \sum_{v \in V_{tr}} \mathrm{CROSSENT}(\tilde{y}_v, y_v), \tag{14}$$

where CROSSENT(·) is the cross-entropy loss, and $y_v \in \mathbb{R}^{C}$
is the one-hot vector that encodes the label of node $v$. Note
that, for multi-label node classification, we can employ binary
cross-entropy to calculate the overall loss.

Finally, we performed joint training on both tasks, allowing 
our model to learn minimal context semantics from both 
graph-level and node-level perspectives. We optimized the 
model by minimizing the final objective function: 


$$\mathcal{L} = \gamma \cdot \mathcal{L}_N + (1 - \gamma) \cdot \mathcal{L}_G, \tag{15}$$

where $\gamma \in [0,1]$ is a balance scalar.
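The joint objective of Eq. (15) is a convex combination of the node-level and graph-level cross-entropy losses. The sketch below shows that combination in plain Python, with the balance scalar named `gamma` here. It is an illustration under assumptions: the MLP heads are replaced by given logits, the normalization of the node predictions is omitted, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and an integer class label."""
    return -math.log(softmax(logits)[target])


def joint_loss(node_logits, node_labels, graph_logits, graph_labels, gamma):
    """gamma * node-level loss + (1 - gamma) * graph-level loss."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    loss_node = sum(cross_entropy(z, y) for z, y in zip(node_logits, node_labels))
    loss_graph = sum(cross_entropy(z, y) for z, y in zip(graph_logits, graph_labels))
    return gamma * loss_node + (1.0 - gamma) * loss_graph


# One node with 3 classes and one subgraph with 2 classes (perturbed or not).
loss = joint_loss([[2.0, 0.5, 0.1]], [0], [[0.3, 1.2]], [1], gamma=0.5)
print(round(loss, 4))
```

Setting `gamma` to 1 trains on node classification only and setting it to 0 trains on perturbation discrimination only, so the scalar controls how much each task shapes the learned representation.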


III. EXPERIMENTS


In this section, we perform a comprehensive set of exper- 
iments to assess the effectiveness of our proposed method, 
POGAT, specifically targeting node classification and link 
prediction tasks. Our goal is to showcase the superiority of 
POGAT by comparing its performance with existing state-of- 
the-art methods. 


A. Datasets. 


Our experimental evaluation spans six publicly available,
real-world datasets: IMDB-L, IMDB-S, Alibaba, DBLP, Freebase,
and AMiner. A concise summary of each
dataset’s statistical properties is provided in Table I. For all
baselines, we use their released source code and the parameters 
recommended by their papers to ensure that their methods 
achieve the desired effect. 


B. Node classification. 


We evaluate our model on node classification by comparing it
against state-of-the-art baselines. The results are detailed in
Table II. POGAT surpasses all baseline models in both Macro-F1
and Micro-F1 on all four heterogeneous networks, which indicates
the effectiveness of our approach in capturing the underlying
structures and relationships within the data. For DBLP and
IMDB-S, we use the standard settings and compare against the
HGB leaderboard results. For the remaining datasets, we start
from the default hyperparameter settings of the baseline models
and then fine-tune them based on validation performance.


C. Link prediction. 


Next, we evaluate POGAT’s performance in unsupervised
link prediction against leading baselines. The results are
summarized in Table III. POGAT achieves state-of-the-art results
on most metrics, showing that it can effectively identify and
predict connections within complex network structures. Notably,
POGAT demonstrates an average improvement of 5.92%, 5.42% and 5.54%
in R-AUC, PR-AUC, and F1, respectively, over the GNN
baseline MHGCN on six datasets.


IV. CONCLUSION 


In conclusion, this research addresses the challenges of 
heterogeneous network embedding through the introduction 
of Ontology. We present Perturbation Ontology-based Graph
Attention Networks (POGAT), a novel approach that integrates ontology
subgraphs with an advanced self-supervised learning
framework to achieve a deeper contextual understanding. Ex- 
perimental results on six real-world heterogeneous networks 
demonstrate the effectiveness of POGAT, showcasing its su- 
periority in both node classification and link prediction tasks. 